Active learning for e-rulemaking: public comment categorization
Authors
Abstract
We address the e-rulemaking problem of reducing the manual labor required to analyze public comment sets. In current and previous work, text categorization techniques have been used to speed up the comment analysis phase of e-rulemaking by automatically classifying sentences according to the rule-specific issues [2] or general topics [7, 8] that they address. Manually annotated data, however, is still required to train the supervised inductive learning algorithms that perform the categorization. This paper therefore investigates active learning methods for public comment categorization: we develop two new, general-purpose active learning techniques that selectively sample from the available training data for human labeling when building the sentence-level classifiers employed in public comment categorization. Using an e-rulemaking corpus developed for our purposes [2], we compare our methods to the well-known query by committee (QBC) active learning algorithm [5] and to a baseline that randomly selects instances for labeling in each round of active learning. We show that our methods outperform both the random selection active learner and the QBC variation by a statistically significant margin, requiring many fewer training examples to reach the same levels of accuracy on a held-out test set. This provides promising evidence that automated text categorization methods might be used effectively to support public comment analysis.
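As a rough illustration of the two comparison strategies named in the abstract, the sketch below shows one selection round of query by committee next to random selection. It is not the paper's implementation: the vote-entropy disagreement measure, the function names, and the toy keyword classifiers are all assumptions made for the example, and the paper's two new techniques are not shown because the abstract does not describe them.

```python
import math
import random
from collections import Counter
from typing import Callable, Hashable, List, Sequence

# Hypothetical type: a committee member maps a sentence to a category label.
Classifier = Callable[[str], Hashable]


def vote_entropy(votes: Sequence[Hashable]) -> float:
    """Entropy of the committee's votes; 0.0 means every member agrees."""
    counts = Counter(votes)
    total = len(votes)
    return sum(-(c / total) * math.log(c / total) for c in counts.values())


def select_by_committee(
    pool: Sequence[str], committee: Sequence[Classifier], batch_size: int
) -> List[int]:
    """Return indices of the unlabeled sentences the committee disagrees on most."""
    scores = [vote_entropy([clf(s) for clf in committee]) for s in pool]
    ranked = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]


def select_randomly(
    pool: Sequence[str], batch_size: int, rng: random.Random
) -> List[int]:
    """Baseline: pick sentences for labeling uniformly at random."""
    return rng.sample(range(len(pool)), batch_size)


# Toy unlabeled pool and committee (assumed for illustration only).
pool = [
    "The fee is too high",
    "Please extend the deadline",
    "The fee deadline is unclear",
]
committee = [
    lambda s: "cost" if "fee" in s else "timing",
    lambda s: "timing" if "deadline" in s else "cost",
    lambda s: "cost" if s.startswith("The") else "timing",
]

qbc_pick = select_by_committee(pool, committee, batch_size=1)
random_pick = select_randomly(pool, batch_size=1, rng=random.Random(0))
```

In this toy pool the committee votes unanimously on the first two sentences and splits only on the third, so the committee-based selector sends the third sentence to the human annotator. In a full active learning loop the newly labeled sentence would be added to the training set, the committee retrained, and the round repeated.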
Related papers
Facilitating Issue Categorization & Analysis in Rulemaking
One task common to all notice-and-comment rulemaking is identifying substantive claims and arguments made in the comments by stakeholders and other members of the public. Extracting and summarizing this material may be helpful to internal decisionmaking; to produce the legally required public explanation of the final rule, it is essential. When comments are lengthy or numerous, natural language...
A study in rule-specific issue categorization for e-rulemaking
We address the e-rulemaking problem of categorizing public comments according to the issues that they address. In contrast to previous text categorization research in e-rulemaking [5, 6], and in an attempt to more closely duplicate the comment analysis process in federal agencies, we employ a set of rule-specific categories, each of which corresponds to a significant issue raised in the comment...
Digital Government and E-Rulemaking: New Directions for Technology and Regulation
Each year hundreds of federal regulatory agencies issue more than 4,000 new regulations. Before adopting a new regulation, agencies must publish a notice of proposed rulemaking in the Federal Register and allow an opportunity for the public to comment on the proposed rule. They also need to complete scientific, engineering, and economic analyses, as well as respond to comments submitted by outs...
Procedural Politicking: Agency Risk Management in the Federal Rulemaking Process
Administrative procedures are often hailed as the solution to managing an unruly bureaucracy, but they are not self-executing. Rather, they must be implemented by the very agencies whose behavior they are designed to constrain. Further, the expert bureaucrats that oversee these processes have superior insight on how these different procedures tend to play out and can use this information to ste...
An eRulemaking Corpus: Identifying Substantive Issues in Public Comments
We describe the creation of a corpus that supports a real-world hierarchical text categorization task in the domain of electronic rulemaking (eRulemaking). Features of the task and of the eRulemaking domain engender both a non-traditional text categorization corpus and a correspondingly difficult machine learning task. Interannotator agreement results are presented for a group of six annotators...